Baltimore Ceasefire 365 is a city-wide call asking Baltimore residents to avoid having any murders through quarterly Ceasefires and Peace Challenges (February, May, August, and November).

In this post, we use open data and R to look at the distribution of shootings in space and time, and model the impact of the Ceasefires.

Shootings in Baltimore

Baltimore releases detailed data on issues relevant to the city, including crime. Here’s the raw numbers of shootings, plotted for each day.

library(tidyverse)
library(scales)
library(knitr)

bpd <- read_csv("/Users/peterphalen/Documents/ceasefire/BPD_Part_1_Victim_Based_Crime_Data.csv")

# subset to shootings or homicides with a firearm
bpd <- subset(bpd, Description == "SHOOTING" |
                (Description == "HOMICIDE" & Weapon == "FIREARM"))

bpd$CrimeDate <- as.Date(bpd$CrimeDate, format = "%m/%d/%Y")

# there are many crimes per day. collapse to daily counts
daily <- bpd %>% group_by(CrimeDate) %>% summarise(shootings = n())

# fill missing dates, because some had no shootings
full.ts <- data.frame(CrimeDate = seq(daily$CrimeDate[1], 
                                      daily$CrimeDate[nrow(daily)], by="day"))
daily <- full_join(full.ts,daily)
daily <- daily %>% group_by(CrimeDate) %>% mutate_all(funs(ifelse(is.na(.),0,.)))

ggplot(daily) +
  aes(x=CrimeDate, y=jitter(shootings, .2)) +
  geom_point(alpha=.2) + 
  xlab("date") +
  ylab("daily shooting deaths (vertically jittered)") +
  scale_y_continuous(breaks=0:20) +
  scale_x_date(labels = date_format("%b %Y")) +
  ggtitle(" ", 
          subtitle="Baltimore (2012-present)")

Geography of shootings

You can tap neighborhoods to see exact numbers.

Raw counts

This map shows the raw count of murders in Baltimore since 2012 by neighborhood.

library(geojsonio)
library(leaflet)

bpd <- subset(bpd, !is.na(Neighborhood))

# count by neighobrhood
count <- bpd %>%
  group_by(Neighborhood) %>%
  summarise(total.count=n()) 

# get polygons to draw neighborhood maps
nbds <- geojsonio::geojson_read("/Users/peterphalen/Documents/ceasefire/Neighborhoods.geojson", what = "sp")

get_shooting_count <- function(neighborhood){
  nbd <- as.character(neighborhood)
  if(nbd %in% count$Neighborhood){
    count <- count[count$Neighborhood == nbd,]$total.count
    return(count)
  }
  if(!(nbd %in% count$Neighborhood)){
    return(0)
  }
}

nbds$count <- sapply(nbds$Name, get_shooting_count)

# draw legend
range.count <- range(nbds$count,na.rm=T)
labs <- c(0,50,100,150,200,250)
pal.crime <- colorNumeric(colorRamp(c('#ccccff', 'red')), labs)

leaflet(nbds) %>% 
  addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
  addPolygons(stroke=T,
              weight=1,
              popup=paste0(nbds$Name,"<br/>Shootings: ",nbds$count),
              color=~pal.crime(count),
              fillOpacity=.5) %>%
  addLegend("bottomright",title="# of shootings (2012-present)",colors=~pal.crime(labs),labels=~labs)

Population-adjusted map

This version adjusts by the population of each neighborhood. Be careful about the super bright red areas, because some of them have very few residents and so even a single shooting will make them look really dangerous even though they’re probably not. (You can tap neighborhoods to see the number of residents.)

#--------- population-adjusted --------------#
nbds$per1k <- nbds$count / nbds$Population * 1000
nbds$per1k <- round(nbds$per1k)
nbds$per1k <- ifelse(nbds$Population == 0, NA, nbds$per1k)
labs <- c(0,25,50,75)
pal.crime <- colorNumeric(colorRamp(c('#ccccff', 'red')), 
                          labs,
                          na.color = "#b2b2b2")

countlabel <- paste0(nbds$Name,"<br/>",nbds$count," shootings among ",nbds$Population," residents")
nbds$countlabel <- ifelse(nbds$Population == 0, paste0(nbds$Name,":<br/>","No residents"), countlabel)

leaflet(nbds) %>% #draw population-adjusted map,
                  #areas with 0 residents are greyed 
                  #out but can still be clicked
  addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
  addPolygons(stroke=T,
              weight=1,
              popup=nbds$countlabel,
              color=~pal.crime(per1k),
              fillOpacity=.6) %>%
  addLegend("bottomright",title="Shootings per one</br>thousand residents</br>(2012-present)",colors=~pal.crime(labs),labels=~labs)

Ceasefire weekends

Ceasefires have been called four times per year since August 2017. These are “ceasefire weekends” but their impact often extends well beyond a few days. One lasted twelve.

# first day (Friday) of ceasefire weekends
ceasefire.initial <- 
  as.Date(
  c("08/04/2017",
    "11/03/2017",
    "02/02/2018",
    "05/11/2018",
    "08/03/2018",
    "11/02/2018",
    "02/01/2019",
    "05/10/2019",
    "08/02/2019"),
      format="%m/%d/%Y")

ceasefire.weekends <- 
  lapply(ceasefire.initial,
         function(x){
           seq(from=x,
               by="day",
               length.out=3)
           }
         )
ceasefire.weekends <- do.call("c", 
                              ceasefire.weekends)

Statistical model

We want to include information about the date, the day of the week (Mon-Sun), the day of the year (1-365), and a binary variable indicating whether a date occurs during ceasefire.

library(lubridate)

# the julian calendar is a simple system for numeric dates
daily$jul <- julian(daily$CrimeDate)

daily$weekday <- factor(weekdays(daily$CrimeDate),
                        levels=c("Monday","Tuesday","Wednesday","Thursday",
                                 "Friday","Saturday","Sunday"))

daily$seasonal <- yday(daily$CrimeDate)

daily$day.of.year <- date_format("%b-%d")(daily$CrimeDate)


daily$ceasefire <- factor(ifelse(daily$CrimeDate %in% ceasefire.weekends, 1, 0),
                          labels=c("Regular Day","Ceasefire Weekend"))

We’ll predict shootings using: * spline time trend + splines are estimated to smooth noisy data * cyclical spline for yearly seasonality + the cyclical constraint requires the seasonal effect to begin where it ended * varying intercepts for day of the week + in case weekends have different patterns of shootings than weekdays * varying intercept for each day of the year + in case “special days”, like mothers day, show different patterns of shootings * binary indicator for the ceasefire

We use a Poisson link function because the outcome is a count and the mean is about the same as sd.

library(rstanarm)

model <- stan_gamm4(shootings ~ 
            s(jul) +
            s(seasonal, 
             bs="cc") + #cyclical constraint 
            ceasefire, 
           random= ~ (1 | weekday) + 
                     (1 | day.of.year), # every day....
           data=daily,
           cores=4,
           iter=2000,
           family=poisson)

Here is a plot of the model against the observations. The red line represents the average. The grey area is the 80% predictive interval, which means that about 80% of days will have shooting counts that fall within the grey area. 10% of the time, the number of shootings will extend above the grey area

Ceasefires are visible as eight dramatic downward red spikes beginning in 2017.

daily$Estimate <- apply(posterior_linpred(model, transform=TRUE),
                        2, median)

# 80% posterior predictive interval for main plot
preds <- posterior_predict(model, transform=TRUE)
preds <- apply(preds, 2, function(x){quantile(x, prob=c(.1, .9))})

daily$high <- preds["90%",]
daily$low <- preds["10%",]

daily %>% 
  ggplot(aes(x = CrimeDate, y = shootings)) +
  geom_point(alpha=.2) +
  geom_line(aes(y = Estimate), alpha=.5, color="red") +
  geom_ribbon(aes(ymin=low, ymax=high), alpha=.2) +
  scale_y_continuous(breaks=c(0,4,8,12)) +
  xlab("date") +
  theme_bw()

Model components

We plot the marginal effects, showing the components that make up the above time series. The specific numbers of shootings are to some extent dependent upon the reference point, but these figures give you the right idea of the shape of the trends.

### Time trend plot
time.frame <- with(daily, # Ref: August
                  data.frame(
                    jul=jul,
                    weekday=0, # not used for this prediction
                    ceasefire="Regular Day",
                    day.of.year="Aug-01",
                    seasonal=yday(as.Date("2018-08-01"))))

post <- posterior_linpred(model,
                          newdata=time.frame,
                          transform=TRUE,
                          re.form = NA)
time.frame$Estimate <- apply(post,2, median)

# 95% CI
ci <- apply(post,2,function(x){quantile(x, prob=c(.025, .975))})
time.frame$low <- ci["2.5%",]
time.frame$high <- ci["97.5%",]

trend.axis.dates <- seq(from=as.Date("2012-01-01"),
                        by="year",
                        length.out=9)
time.plot <- 
  time.frame %>% 
  ggplot() +
  aes(x=jul, y=Estimate) +
  geom_line(aes(y = Estimate), alpha=.5) +
  geom_ribbon(aes(ymin=low, ymax=high), alpha=.2) +
  xlab("Time trend") + 
  ylab("Shootings") +
  ggtitle(" ") +
  scale_x_continuous(
    breaks=julian(trend.axis.dates),
    labels=date_format("%m-%Y")(trend.axis.dates)) +
  theme_bw()

Here’s the deseasonalized time trend we just created in the above block of code. Shootings increased sometime in 2015 and have stayed high.

time.plot 

Now we’re going to construct all the seasonal graphs…

### day of year plot
doy.for.single.year <- date_format("%b-%d")(seq(as.Date("0000-01-01"), as.Date("0000-12-31"), by = 1))
exact.day.frame <- with(daily, # Ref: regular day in mid-2018
                data.frame(
                  jul=julian(as.Date("2018-08-01"))[1],
                  weekday=0, # weekday not used for this prediction
                  ceasefire="Regular Day",
                  doy.index = 1:length(doy.for.single.year),
                  day.of.year=doy.for.single.year,
                seasonal=100))

doy.samples <- as.matrix(model, regex_pars = " day.of.year") 

doy.effects <- apply(doy.samples,2, median)
# scale to an odds ratio by exponentiating
doy.effects <- exp(doy.effects)
exact.day.frame$Estimate <- as.vector(doy.effects)

# 95% credible intervals (exponentiated)
ci <- exp(apply(doy.samples,2,function(x){quantile(x, prob=c(.025, .975))}))
exact.day.frame$low <- ci["2.5%",]
exact.day.frame$high <- ci["97.5%",]

doy.axis.dates <- seq(as.Date("0-01-01"),by="month",length.out=12)

largest.day <- which.max(exact.day.frame$Estimate)
largest.day.row <- exact.day.frame[largest.day,]

lowest.day <- which.min(exact.day.frame$Estimate)
lowest.day.row <- exact.day.frame[lowest.day,]

xmas.day <- exact.day.frame[which(exact.day.frame$day.of.year=="Dec-25"),]

doy.plot <- 
  exact.day.frame %>% 
  ggplot() +
  aes(x=doy.index, y=Estimate) +
  geom_line(aes(y = Estimate), alpha=.5) +
  geom_ribbon(aes(ymin=low, ymax=high), alpha=.2) +
  xlab("Day of year") + 
  ylab("Shootings\n(odds ratio)") +
  ggtitle(" ") +
  scale_x_continuous(
    breaks=yday(c(doy.axis.dates, as.Date("0-12-31"))),
    labels=date_format("%b-%d")(c(doy.axis.dates, as.Date("0-01-01")))
  ) +
  #highest day
  annotate("text", x = largest.day.row$doy.index, 
           y = largest.day.row$Estimate + .3, 
           label = paste("largest increase:",largest.day.row$day.of.year)) + 
  geom_segment(aes(
               xend = largest.day.row$doy.index, 
               x = largest.day.row$doy.index + 5,
               yend = largest.day.row$Estimate + .02,
               y = largest.day.row$Estimate + .26),
               arrow = arrow(length = unit(0.5, "cm"))) +
  # lowest day
  annotate("text", x = lowest.day.row$doy.index - 20, 
           y = lowest.day.row$Estimate - .25, 
           label = paste("largest decrease:",lowest.day.row$day.of.year)) + 
  geom_segment(aes(
               xend = lowest.day.row$doy.index, 
               x = lowest.day.row$doy.index - 6,
               yend = lowest.day.row$Estimate - .02,
               y = lowest.day.row$Estimate - .22),
               arrow = arrow(length = unit(0.5, "cm"))) +
  # christmas day
  annotate("text", x = xmas.day$doy.index - 55, 
           y = xmas.day$Estimate + .33, 
           label = "Christmas day (Dec 25th)") + 
  geom_segment(aes(
               xend = xmas.day$doy.index - 3, 
               x = xmas.day$doy.index - 40,
               yend = xmas.day$Estimate + .02,
               y = xmas.day$Estimate + .28),
               arrow = arrow(length = unit(0.5, "cm"))) +
  theme_bw()

### seasonal plot
seasonal.frame <- with(daily, # Ref: regular day in mid-2018
                data.frame(
                  jul=julian(as.Date("2018-08-01"))[1],
                  weekday=0, # weekday not used for this prediction
                  ceasefire="Regular Day",
                  day.of.year=0,
                  seasonal=1:365))

post <- posterior_linpred(model,
                           newdata=seasonal.frame,
                           transform=TRUE,
                          re.form = NA)
seasonal.frame$Estimate <- apply(post,2, median)

# 95% CI
ci <- apply(post,2,function(x){quantile(x, prob=c(.025, .975))})
seasonal.frame$low <- ci["2.5%",]
seasonal.frame$high <- ci["97.5%",]

doy.axis.dates <- seq(as.Date("0-01-01"),by="month",length.out=12)

seasonal.plot <- 
  seasonal.frame %>% 
  ggplot() +
  aes(x=seasonal, y=Estimate) +
  geom_line(aes(y = Estimate), alpha=.5) +
  geom_ribbon(aes(ymin=low, ymax=high), alpha=.2) +
  xlab("Seasonal trend") + 
  ylab("Shootings") +
  ggtitle(" ") +
  scale_x_continuous(
    breaks=yday(c(doy.axis.dates, as.Date("0-12-31"))),
    labels=date_format("%b")(c(doy.axis.dates, as.Date("0-01-01")))
  ) +
  theme_bw()

### Day of week plot
wday.frame <- with(daily, # Ref: regular day in August 2018
                  data.frame(
                    jul=julian(as.Date("2018-08-01"))[1],
                    weekday=unique(daily$weekday),
                    ceasefire="Regular Day",
                    day.of.year=yday(as.Date("2018-08-01")),
                    seasonal=yday(as.Date("2018-08-01"))))

post <- posterior_linpred(model,
                          newdata=wday.frame,
                          transform=TRUE)
wday.frame$Estimate <- apply(post,2, median)

# 95% CI
ci <- apply(post,2,function(x){quantile(x, prob=c(.025, .975))})
wday.frame$low <- ci["2.5%",]
wday.frame$high <- ci["97.5%",]

wday.plot <- 
  wday.frame %>% 
  ggplot() +
  aes(x=weekday, y=Estimate) +
  geom_point(size=2) +
  geom_errorbar(aes(ymin=low, ymax=high),
                width=.2) +
  xlab("Day of week") + 
  ylab("Shootings") +
  ggtitle(" ") +
  theme_bw()

Here’s the day-of-year plot we just constructed. We don’t see a lot of evidence for day-of-year effects: the 95% credible intervals for all these days overlap. I’ve mared the highest and lowest days (after accounting for slower-moving seasonal trends). I’ve also marked Christmas day, because it was interesting to me.

doy.plot

And here are the 5 days with the lowest numbers of shootings (relative to their placement in the season).

day.effects.sorted <- exact.day.frame[order(exact.day.frame$Estimate),
                                        "day.of.year"]
kable(data.frame(five_lowest_days=head(day.effects.sorted,5)))
five_lowest_days
Dec-11
Jun-04
Apr-17
Nov-08
Apr-15

And here are the 5 days with highest numbers of shootings (relative to their placement in the season). Christmas day is the second highest point.

kable(data.frame(five_highest_days=rev(tail(day.effects.sorted,5))))
five_highest_days
Apr-22
Dec-25
Dec-04
Jan-28
May-28

We do see a strong seasonal effect, with summers showing up as particularly bad. We see the largest dips in shootings in February and March, cold and dark months in Baltimore.

seasonal.plot

There is essentially no weekday effect.

wday.plot

Effect of Ceasefire

Finally, we can use this model to measure the effect of Ceasefires on shootings per day, after accounting for all these trends and seasonalities. The effect of the Ceasefire (plotted here as an odds ratio with 95% credible intervals) is classically statistically significant and suggests an approximate 50% reduction in shootings during ceasefire weekends, after accounting for all other effects (e.g., day of the week, holidays, etc):

We can also use this model to see the impact of the ceasefire at specific points in time. For example, here is the model-estimated impact of the ceasefire on Friday August 2nd, 2019.

### Ceasefire plot
pred.day <- as.Date("2019-08-02")
ceasefire.frame <- with(daily, 
                  data.frame(
                    jul=julian(pred.day)[1],
                    weekday="Friday",
                    ceasefire=factor(c("Regular Day",
                                "Ceasefire Weekend"),
                                levels=c("Regular Day",
                                         "Ceasefire Weekend")),
                    day.of.year=yday(pred.day),
                    seasonal= yday(pred.day)))

post <- posterior_linpred(model,
                          newdata=ceasefire.frame,
                          transform=TRUE)
ceasefire.frame$Estimate <- apply(post,2, median)

# 95% CI
ci <- apply(post,2,function(x){quantile(x, prob=c(.25, .75))})
ceasefire.frame$low <- ci["25%",]
ceasefire.frame$high <- ci["75%",]

# 50% posterior predictive interval for main plot
preds <- posterior_predict(model,
                          newdata=ceasefire.frame,
                          transform=TRUE)

ceasefire.frame$high.ppd <- apply(preds,2,function(x){quantile(x, prob=c(.75), na.rm=T)})
ceasefire.frame$low.ppd <- apply(preds,2,function(x){quantile(x, prob=c(.25), na.rm=T)})


ceasefire.frame %>% 
  ggplot() +
  aes(x=ceasefire, y=Estimate) +
  geom_point(aes(y = low.ppd), col="blue", shape=95, size=5) +
  geom_point(aes(y = high.ppd), col="blue", shape=95, size=5) +
  geom_point(aes(y = Estimate),
             size=2) +
  geom_errorbar(aes(ymin=low, ymax=high), 
                width=.2) +

  xlab("") + 
  ylab("Shootings") +
  ggtitle("Predicted shooting count for Friday August 8, 2019",
          subtitle="with 50% credible intervals (black) and posterior predictive intervals (blue)") +
  theme_bw()

Without a ceasefire, we’d expect about four people to get shot each day on average. But because this will be a Ceasefire weekend, the model expects about half that many to be killed.

Notice the little blue horizontal lines drawn at 1 and 3 for the ceasefire weekend estimate. Those marks represent the 50% window for the number of shootings you can expect on any given day. The fact that the lower tick mark rests at 1 and 3 means that there will be 0 shootings on a ceasefire day about 25% of the time, and >3 shootings about 25% of the time.

The ceasefire impact is real and at the same time, our model won’t be surprised by the occurrence of several shootings on each day of ceasefire weekend.